17 research outputs found

    Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families

    Full text link
    We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They put forward the question what families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. The exchangeability can happen only for three classes of natural exponential family distributions, namely the Gaussian, Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information1Comment: 23 page

    Learning to Crawl

    Full text link
    Web crawling is the problem of keeping a cache of webpages fresh, i.e., having the most recent copy available when a page is requested. This problem is usually coupled with the natural restriction that the bandwidth available to the web crawler is limited. The corresponding optimization problem was solved optimally by Azar et al. [2018] under the assumption that, for each webpage, both the elapsed time between two changes and the elapsed time between two requests follow a Poisson distribution with known parameters. In this paper, we study the same control problem but under the assumption that the change rates are unknown a priori, and thus we need to estimate them in an online fashion using only partial observations (i.e., single-bit signals indicating whether the page has changed since the last refresh). As a point of departure, we characterise the conditions under which one can solve the problem with such partial observability. Next, we propose a practical estimator and compute confidence intervals for it in terms of the elapsed time between the observations. Finally, we show that the explore-and-commit algorithm achieves an O(T)\mathcal{O}(\sqrt{T}) regret with a carefully chosen exploration horizon. Our simulation study shows that our online policy scales well and achieves close to optimal performance for a wide range of the parameters.Comment: Published at AAAI 202

    Open problem: Fast and optimal online portfolio selection

    Get PDF
    Online portfolio selection has received much attention in the COLT community since its introduction by Cover, but all state-of-the-art methods fall short in at least one of the following ways: they are either i) computationally infeasible; or ii) they do not guarantee optimal regret; or iii) they assume the gradients are bounded, which is unnecessary and cannot be guaranteed. We are interested in a natural follow-the-regularized-leader (FTRL) approach based on the log barrier regularizer, which is computationally feasible. The open problem we put before the community is to formally prove whether this approach achieves the optimal regret. Resolving this question will likely lead to new techniques to analyse FTRL algorithms. There are also interesting technical connections to self-concordance, which has previously been used in the context of bandit convex optimization

    Quantum learning: asymptotically optimal classification of qubit states

    No full text
    Pattern recognition is a central topic in learning theory, with numerous applications such as voice and text recognition, image analysis and computer diagnosis. The statistical setup in classification is the following: we are given an i.i.d. training set (X1, Y1),...,(Xn, Yn), where X i represents a feature and Y in {0, 1} is a label attached to that feature. The underlying joint distribution of (X, Y) is unknown, but we can learn about it from the training set, and we aim at devising low error classifiers f: X→Y used to predict the label of new incoming features. In this paper, we solve a quantum analogue of this problem, namely the classification of two arbitrary unknown mixed qubit states. Given a number of 'training' copies from each of the states, we would like to 'learn' about them by performing a measurement on the training set. The outcome is then used to design measurements for the classification of future systems with unknown labels. We found the asymptotically optimal classification strategy and show that typically it performs strictly better than a plug-in strategy, which consists of estimating the states separately and then discriminating between them using the Helstrom measurement. The figure of merit is given by the excess risk equal to the difference between the probability of error and the probability of error of the optimal measurement for known states. We show that the excess risk scales as n^{–1} and compute the exact constant of the rate

    Following the Flattened Leader

    No full text
    We analyze the regret, measured in terms of log loss, of the maximum likelihood (ML) sequential prediction strategy. This “follow the leader ” strategy also defines one of the main versions of Minimum Description Length model selection. We proved in prior work for single parameter exponential family models that (a) in the misspecified case, the redundancy of follow-the-leader is not 1 2 log n+O(1), as it is for other universal prediction strategies; as such, the strategy also yields suboptimal individual sequence regret and inferior model selection performance; and (b) that in general it is not possible to achieve the optimal redundancy when predictions are constrained to the distributions in the considered model. Here we describe a simple “flattening” of the sequential ML and related predictors, that does achieve the optimal worst case individual sequence regret of (k/2)log n + O(1) for k parameter exponential family models for bounded outcome spaces; for unbounded spaces, we provide almost-sure results. Simulations show a major improvement of the resulting model selection criterion

    Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in On-line Density Estimation

    No full text
    The paper considers sequential prediction of individual sequences with log loss (online density estimation) using an exponential family of distributions. We first analyze the regret of the maximum likelihood ("follow the leader") strategy. We find that this strategy is (1) suboptimal and (2) requires an additional assumption about boundedness of the data sequence. We then show that both problems can be be addressed by adding the currently predicted outcome to the calculation of the maximum likelihood, followed by normalization of the distribution. The strategy obtained in this way is known in the literature as the sequential normalized maximum likelihood or last-step minimax strategy. We show for the first time that for general exponential families, the regret is bounded by the familiar (k/2) log(n) and thus optimal up to O(1). We also show the relationship to the Bayes strategy with Jeffreys' prior

    Bipartite Ranking through Minimization of Univariate Loss

    No full text
    Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by minimizing a simple univariate loss function, as done by standard classification methods, as a surrogate. In this paper, we first note that minimization of 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only for rank risk but, more importantly, also for rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis

    Learning Eigenvectors for Free

    No full text
    We extend the classical problem of predicting a sequence of outcomes from a finite alphabet to the matrix domain. In this extension, the alphabet of n outcomes is replaced by the set of all dyads, i.e. outer products uu^T where u is a vector in R^n of unit length. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we desire to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the n^2 parameters of a density matrix is much harder than learning the n parameters of a multinomial distribution. Completely surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worstcase sequence of dyads share a common eigensystem, i.e. the worst case regret is achieved in the classical case. So these matrix algorithms learn the eigenvectors without any regret
    corecore